Lightning-AI / pytorch-lightning Public

[Feat] Inter-batch-parallelism #8700

Closed

tchaton wants to merge 28 commits into master from inter_batch_parallism

tchaton commented Aug 3, 2021 (edited)

What does this PR do?

Fixes #8316

Does your PR introduce any breaking changes? If yes, please list them.

Before submitting

Was this discussed/approved via a GitHub issue? (not for typos and docs)
Did you read the contributor guideline, Pull Request section?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing make sure you have read Review guidelines. In short, see the following bullet-list:

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified

Did you have fun?

Make sure you had fun coding 🙃

241ad1f  update

pep8speaks commented Aug 3, 2021 (edited)

Hello @tchaton! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file tests/trainer/test_trainer.py:

Line 1950:19: E203 whitespace before ':'

Comment last updated at 2021-08-10 13:31:14 UTC

cfe0c08  [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

codecov bot commented Aug 3, 2021 (edited)

Codecov Report

Merging #8700 (53eb4d7) into master (f1cc6e3) will decrease coverage by 49%.
The diff coverage is 32%.

@@           Coverage Diff            @@
##           master   #8700     +/-   ##
========================================
- Coverage      93%     43%    -49%     
========================================
  Files         169     170      +1     
  Lines       14068   14146     +78     
========================================
- Hits        13038    6138   -6900     
- Misses       1030    8008   +6978

tchaton mentioned this pull request

Add a flavor of training_step that allows for expressing inter-batch parallelism #8316

Closed

tchaton and others added 11 commits, August 3, 2021 11:22

132967d  update
45690e3  Merge branch 'inter_batch_parallism' of https://github.com/PyTorchLightning/pytorch-lightning into inter_batch_parallism
aa1709e  [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
3cc770b  update
28c45ce  Merge branch 'inter_batch_parallism' of https://github.com/PyTorchLightning/pytorch-lightning into inter_batch_parallism
67f034a  bad merge
989b7cc  [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
7e08c87  [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
ef34d31  improve test
1e6c977  Merge branch 'inter_batch_parallism' of https://github.com/PyTorchLightning/pytorch-lightning into inter_batch_parallism
08f7860  [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)

kaushikb11 reviewed

pytorch_lightning/utilities/dataloader_fetcher.py (outdated, resolved)

kaushikb11 and others added 11 commits, August 4, 2021 10:53

f0a89d3  Raise exception for non GPUs
46bdb43  Code refactor
c14aad5  Update test & dataloader fetcher
8a4da32  Fix profiled iterator
1c795f7  Attempt to fix the issue with hacky iterator
5a372e2  Update LightningFetcher & use num_prefetch_batches instead
417e30a  Merge branch 'inter_batch_parallism' of https://github.com/PyTorchLightning/pytorch-lightning into inter_batch_parallism
a593b54  Update evvent usage
94eec7f  Update tests
b101cb0  Update defaults
c3578a5  code health fix

tchaton mentioned this pull request

Add a flavor of training_step that takes dataloader_iter as an argument #8807

Merged

12 tasks


e0c82b6  Merge branch 'master' into inter_batch_parallism

kaushikb11 added 3 commits, August 10, 2021 14:02

ee96e37  Update last for LightningFetcher
4eca409  Merge branch 'inter_batch_parallism' of https://github.com/PyTorchLightning/pytorch-lightning into inter_batch_parallism
53eb4d7  Update LightingFetcher

ananthsub reviewed

pytorch_lightning/utilities/dataloader_fetcher.py

		return


		class LightningFetcher:

ananthsub commented Aug 12, 2021

n00b question: certain classes are prefixed with Lightning while others aren't.

With the prefix: LightningModule, LightningDataModule, LightningLoggerBase
Without: Trainer, Callbacks, Logger, Profiler

Utilities like this don't contain logic specific to the rest of the framework, so I wonder if we could call this just Prefetcher?

It can also help users feel that they don't need to learn Lightning-specific things.

tchaton reacted with heart emoji

tchaton commented Aug 16, 2021

I would propose DataFetcher.

pytorch_lightning/utilities/dataloader_fetcher.py

+ for _ in range(self.num_prefetch_batches + 1):
+     if not done:
+         with torch.cuda.stream(cuda_stream):

ananthsub commented Aug 12, 2021

should this validate somewhere that torch.cuda is available?

kaushikb11 commented Aug 12, 2021

We validate it here as it is only supported on GPU devices https://github.com/PyTorchLightning/pytorch-lightning/blob/inter_batch_parallism/pytorch_lightning/trainer/trainer.py#L1344
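As a rough illustration of the kind of up-front check discussed here, the following is a minimal sketch and not the code at the linked trainer line. The helper names `cuda_is_available` and `require_cuda` are hypothetical, and `RuntimeError` stands in for whatever exception type the project actually raises. The only library call assumed is `torch.cuda.is_available()`.

```python
def cuda_is_available() -> bool:
    """Return True only if torch is importable and reports a usable CUDA device."""
    try:
        import torch  # treated as an optional dependency in this sketch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def require_cuda(feature: str) -> None:
    """Fail early, with a readable message, when a GPU-only feature is requested.

    Hypothetical helper: RuntimeError is a stand-in for the project's own exception.
    """
    if not cuda_is_available():
        raise RuntimeError(f"{feature} requires a CUDA device, but none is available.")
```

Calling `require_cuda("inter-batch parallelism")` once, before any `torch.cuda.stream(...)` block is entered, would keep the fetcher loop itself free of device checks.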

pytorch_lightning/utilities/dataloader_fetcher.py

+ self,
+ dataloader,
+ batch_to_device: Callable,
+ profiler: "pl.profiler.base.BaseProfiler",

ananthsub commented Aug 12, 2021

could the profiler be optional? the lightning trainer would always set this, but this becomes a handy utility for users outside the project too (it might even draw them into the project)
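One way the optional profiler could look is sketched below. This is an assumption-laden illustration, not the PR's class: `SimpleFetcher` is a hypothetical stand-in, and the profiler is only assumed to expose a `profile(name)` method that returns a context manager. When no profiler is given, `contextlib.nullcontext()` acts as a no-op.

```python
from contextlib import nullcontext
from typing import Any, Callable, Iterable, Iterator, Optional


class SimpleFetcher:
    """Hypothetical stand-in for the fetcher under review; not the PR's class."""

    def __init__(
        self,
        dataloader: Iterable[Any],
        batch_to_device: Callable[[Any], Any],
        profiler: Optional[Any] = None,  # assumed to expose .profile(name) as a context manager
    ) -> None:
        self.dataloader = dataloader
        self.batch_to_device = batch_to_device
        self.profiler = profiler

    def _profile(self, name: str):
        # Fall back to a no-op context manager when no profiler was supplied.
        if self.profiler is None:
            return nullcontext()
        return self.profiler.profile(name)

    def __iter__(self) -> Iterator[Any]:
        iterator = iter(self.dataloader)
        while True:
            with self._profile("fetch_batch"):
                try:
                    batch = next(iterator)
                except StopIteration:
                    return
            with self._profile("batch_to_device"):
                batch = self.batch_to_device(batch)
            yield batch
```

With this shape, a user outside the framework could write `for batch in SimpleFetcher(loader, move_fn): ...` without constructing a profiler, while the trainer would still pass its own.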

tchaton reacted with thumbs up emoji

pytorch_lightning/utilities/dataloader_fetcher.py

Comment on lines +34 to +35

		This class is used to perform ``pre-fetching`` for the ``train`` dataloader
		and apply inter batch parallelism if enabled.

ananthsub commented Aug 12, 2021

this could/should be used for evaluation too, so that the forward pass still overlaps with the host-to-device transfer

tchaton commented Aug 16, 2021

Yes, added in fault tolerant training PRs :)
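The scheduling idea behind this thread does not depend on whether the loader feeds training or evaluation. The sketch below is a framework-free illustration under stated assumptions, not the PR's or the later fault-tolerant implementation: `prefetch` is a hypothetical function, and `num_prefetch_batches` is borrowed from the diff only as a parameter name. It shows the ordering only: the next transfers are issued before the current batch is handed to the consumer. Real overlap would additionally need an asynchronous `batch_to_device`, for example a copy issued on a separate CUDA stream as in the diff.

```python
from collections import deque
from typing import Any, Callable, Iterable, Iterator, Tuple


def prefetch(
    dataloader: Iterable[Any],
    batch_to_device: Callable[[Any], Any],
    num_prefetch_batches: int = 1,
) -> Iterator[Tuple[Any, bool]]:
    """Yield (batch, is_last), keeping up to `num_prefetch_batches` batches queued ahead."""
    if num_prefetch_batches < 1:
        raise ValueError("num_prefetch_batches must be >= 1 to detect the last batch")
    iterator = iter(dataloader)
    queue: deque = deque()
    exhausted = False

    def fill() -> None:
        nonlocal exhausted
        # Hold one batch to hand out plus `num_prefetch_batches` already moved.
        while not exhausted and len(queue) < num_prefetch_batches + 1:
            try:
                queue.append(batch_to_device(next(iterator)))
            except StopIteration:
                exhausted = True

    fill()
    while queue:
        batch = queue.popleft()
        fill()  # issue the next transfer(s) before the consumer works on `batch`
        yield batch, not queue  # an empty queue after refilling means this is the last batch
```

The same generator can wrap a validation or test loader unchanged, since nothing in it refers to the training loop.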

tchaton closed this

Borda deleted the inter_batch_parallism branch

March 29, 2022 05:00

Reviewers

kaushikb11: left review comments
ananthsub: left review comments
lantiga: code owner; review will be requested when the pull request is marked ready for review
Borda: code owner; review will be requested when the pull request is marked ready for review
justusschock: code owner; review will be requested when the pull request is marked ready for review
ethanwharris: code owner; review will be requested when the pull request is marked ready for review

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Add a flavor of training_step that allows for expressing inter-batch parallelism

4 participants
